[Comment-Thomas: note, to-dos are now mostly listed in margin notes on right.]

1 Introduction

1.1 Motivation

The spatialHeatmap package provides functionalities for visualizing cell-, tissue- and organ-specific data of biological assays by coloring the corresponding spatial features defined in anatomical images according to a numeric color key. The color scheme used to represent the assay values can be customized by the user. This core functionality of the package is called a spatial heatmap (SHM) plot. It is enhanced with nearest neighbor visualization tools for groups of measured items (e.g. gene modules) sharing related abundance profiles, including matrix heatmaps combined with hierarchical clustering dendrograms and network representations. The functionalities of spatialHeatmap can be used either in a command-driven mode from within R or a graphical user interface (GUI) provided by a Shiny App that is also part of this package. While the R-based mode provides flexibility to customize and automate analysis routines, the Shiny App includes a variety of convenience features that will appeal to experimentalists and other users less familiar with R. Moreover, the Shiny App can be used on both local computers as well as centralized server-based deployments (e.g. cloud-based or custom servers) that can be accessed remotely as a public web service for using spatialHeatmap’s functionalities with community and/or private data. The functionalities of the spatialHeatmap package are illustrated in Figure 1.

Functionality overview of spatialHeatmap. This figure illustrates the overview of spatialHeatmap functionality on gene expression data, which could be generalized on many other data types. The accepted data classes are `SummarizedExperiment` (SE, A1), `data frame` (A2), `vector` (A3). If exist, replicates should be aggregated. Subfigure B is the aSVG image (see [aSVG](#term) below), where target spatial features (*e.g.* frontal.cortex) are pre-annotated. The numeric values of target samples in data are mapped to matching features in aSVG and translated to colors (*e.g.* frontal.cortex), and the resulting plots are termed spatial heatmaps (SHMs, subfigure C). The color scale on the left indicates gene expression levels and the legend plot on the right labels matching spatial features. As enhancements to SHMs, target genes are visualized in the context of nearest neighbors in form of matrix heatmap (subfigure D), which are selected with correlation or distance measure. Next, network analysis is applied on all genes in matrix heatmap, and the module containing a target gene is shown as network graph (subfigure E). In addition to command-driven mode, all utilities are also combined as an interactive Shiny App, which would be convenient for non-R users.

Figure 1: Functionality overview of spatialHeatmap
This figure illustrates the overview of spatialHeatmap functionality on gene expression data, which could be generalized on many other data types. The accepted data classes are SummarizedExperiment (SE, A1), data frame (A2), vector (A3). If exist, replicates should be aggregated. Subfigure B is the aSVG image (see aSVG below), where target spatial features (e.g. frontal.cortex) are pre-annotated. The numeric values of target samples in data are mapped to matching features in aSVG and translated to colors (e.g. frontal.cortex), and the resulting plots are termed spatial heatmaps (SHMs, subfigure C). The color scale on the left indicates gene expression levels and the legend plot on the right labels matching spatial features. As enhancements to SHMs, target genes are visualized in the context of nearest neighbors in form of matrix heatmap (subfigure D), which are selected with correlation or distance measure. Next, network analysis is applied on all genes in matrix heatmap, and the module containing a target gene is shown as network graph (subfigure E). In addition to command-driven mode, all utilities are also combined as an interactive Shiny App, which would be convenient for non-R users.

jianhai-reply: legend is edited.1 Thomas-Comment: the legend text of Figure 1 should provide a brief overview of what is shown in the flowchart. Right now there is too much technical detail that lacks context. I suggest to significantly shorten it. Also, you may want to organize the flowchart with bullet labels like (A), (B), (C) and (D) that can be referenced in the text. For instance: (A) Expression Data, (B) aSVG, (C) SHM and (D) Gene Context Plots. This way there is a clear label plus a visual illustration for each component one can refer to in the text with Figure 1A or Figure 1B, etc.

As anatomical images the package supports both tissue maps from public repositories and custom images provided by the user. In general any type of image can be used as long as it can be provided in SVG (Scalable Vector Graphics) format, where the corresponding spatial features have been defined (see aSVG below). The numeric values plotted onto an SHM are usually quantitative measurements from a wide range of profiling technologies, such as microarrays, next generation sequencing (e.g. RNA-Seq and scRNA-Seq), proteomics, metabolomics, or many other small- or large-scale experiments. For convenience, several preprocessing and normalization methods for the most common use cases are included that support raw and/or preprocessed data. Currently, the main application domains of the spatialHeatmap package are numeric data sets and spatially mapped images from biological, agricultural and biomedical areas. Moreover, the package has been designed to also work with many other spatial data types, such a population data plotted onto geographic maps. This high level of flexibility is one of the unique features of spatialHeatmap. Related software tools for biological applications in this field are largely based on pure web applications2 Semantic Body Browser (Lekschas et al. 2015), and the Expression Atlas (https://www.ebi.ac.uk/gxa/home) both are able to visualize data on anatomical images and are web-based, should they be cited? (Winter et al. 2007; Waese et al. 2017) or local tools (Maag 2018; Muschelli, Sweeney, and Crainiceanu 2014) that typically lack customization functionalities. These restrictions limit users to utilizing pre-existing expression data and/or fixed sets of anatomical image collections. To close this gap for biological use cases, we have developed spatialHeatmap as a generic R/Bioconductor package for plotting quantitative values onto any type of spatially mapped images in a programmable environment and/or in an intuitive to use GUI application.

1.2 Design

The core feature of spatialHeatmap is to map the assay values (e.g. gene expression data) of one or many items (e.g. genes) measured under different conditions in form of numerically graded colors onto the corresponding cell types or tissues represented in a chosen SVG image. In the gene profiling field, this feature supports comparisons of the expression values among multiple genes by plotting their SHMs next to each other. Similarly, one can display the expression values of a single or multiple genes across multiple conditions in the same plot (Figure 3). This level of flexibility is very efficient for visualizing complicated expression patterns across genes, cell types and conditions. In case of more complex anatomical images composed of overlapping multiple layer tissues, it is important to visually expose the tissue layer of interest in the plots. To address this, several default and customizable layer viewing options are provided. They allow to hide features in the top layers by making them transparent in order to expose features below them. This transparency viewing feature is highlighted below in the mouse example (Figure 6). The SHM functionality also accepts multiple aSVGs as shown in Figure 9. This is particularly useful when map data to tissues across multiple development phases. In addition to static representation, SHMs are also presented in form of animation and video in order of genes or conditions. In the animation, each SHM is interactive such as zoom in and out, mouse over features to show tooltip.

To maximize reusability and extensibility, the package organizes large-scale omics assay data along with the associated experimental design information in a SummarizedExperiment object. The latter is one of the core S4 classes within the Bioconductor ecosystem that has been widely adapted by many other software packages dealing with gene-, protein- and metabolite-level profiling data (M. Morgan et al. 2018). In case of gene expression data, the assays slot of the SummarizedExperiment container is populated with a gene expression matrix, where the rows and columns represent the genes and tissue/conditions, respectively, while the colData slot contains metadata including replicate information. The tissues and/or cell type information in the object maps via colData to the corresponding features in the SVG images using unique identifiers for the spatial features (e.g. tissues or cell types). This allows to color the features of interest in an SVG image according to the numeric data stored in a SummarizedExperiment object. For simplicity the numeric data can also be provided as numeric vectors or data.frames. This can be useful for testing purposes and/or the usage of simple data sets that may not require the more advanced features of the SummarizedExperiment class, such as measurements with only one or a few data points. Details about how to access the SVG images and properly format the associated expression data are provided in the Supplement section of this vignette.

1.3 Image Format: SVG

SHMs are images where colors encode numeric values in features of any shape. For plotting SHMs, Scalable Vector Graphics (SVG) has been chosen as image format since it is a flexible and widely adapted vector graphics format that provides many advantages for computationally embedding numerical and other information in images. SVG is based on XML formatted text describing all components present in images, including lines, shapes and colors. In case of biological images suitable for SHMs, the shapes often represent anatomical or cell structures. To assign colors to specific features in SHMs, annotated SVG (aSVG) files are used where the shapes of interest are labeled according to certain conventions so that they can be addressed and colored programmatically. SVGs and aSVGs of anatomical structures can be downloaded from many sources including the repositories described below. Alternatively, users can generate them themselves with vector graphics software such as Inkscape. Typically, in aSVGs one or more shapes of a feature of interest, such as the cell shapes of an organ, are grouped together by a common feature identifier. Via these group identifiers one or many feature types can be colored simultaneously in an aSVG according to biological experiments assaying the corresponding feature types with the required spatial resolution. Correct assignment of image features and assay results is assured by using for both the same feature identifiers. The color gradient used to visually represent the numeric assay values is controlled by a color gradient parameter. To visually interpret the meaning of the colors, the corresponding color key is included in the SHM plots. Additional details for properly formatting and annotating both aSVG images and assay data are provided in the Supplement section of this vignette.

1.4 Data Repositories

If not generated by the user, SHMs can be generated with data downloaded from various public repositories. This includes gene, protein and metabolic profiling data from databases, such as GEO, BAR and Expression Atlas from EMBL-EBI (Papatheodorou et al. 2018).3 jianhai: added link. A particularly useful resource, when working with spatialHeatmap, is the EBI Expression Atlas. This online service contains both assay and anatomical images. Its assay data include mRNA and protein profiling experiments for different species, tissues and conditions. The corresponding anatomical image collections are also provided for a wide range of species including animals and plants. In spatialHeatmap several import functions are provided to work with the expression and aSVG repository from the Expression Atlas directly. The aSVG images developed by the spatialHeatmap project are available in its own repository called spatialHeatmap aSVG Repository, where users can contribute their aSVG images that are formatted according to our guidlines.

1.5 Tutorial Overview

The following sections of this vignette showcase the most important functionalities of the spatialHeatmap package using as initial example a simple to understand toy data set, and then more complex mRNA profiling data from the Expression Atlas and GEO databases. First, SHM plots are generated for both the toy and mRNA expression data. The latter include gene expression data sets from RNA-Seq and microarray experiments of Human Brain, Mouse Organs, Chicken Organs, and Arabidopsis Shoots. The first three are RNA-Seq data from the Expression Atlas, while the last one is a microarray data set from GEO. Second, gene context analysis tools are introduced, which facilitate the visualization of gene modules sharing similar expression patterns. This includes the visualization of hierarchical clustering results with traditional matrix heatmaps (Matrix Heatmap) as well co-expression network plots (Network). Third, an overview of the corresponding Shiny App is presented that provides access to the same functionalities as the R functions, but executes them in an interactive GUI environment (Chang et al., n.d.; Chang and Borges Ribeiro 2018). Fourth, more advanced features for plotting customized SHMs are covered using the Human Brain data set as an example.

2 Getting Started

2.1 Installation

The spatialHeatmap package should be installed from an R (version \(\ge\) 3.6) session with the BiocManager::install command.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("spatialHeatmap")

2.2 Packages and Documentation

Next, the packages required for running the sample code in this vignette need to be loaded.

library(spatialHeatmap); library(SummarizedExperiment); library(ExpressionAtlas); library(GEOquery)

The following lists the vignette(s) of this package in an HTML browser. Clicking the corresponding name will open this vignette.

browseVignettes('spatialHeatmap')

3 Spatial Heatmaps (SHMs)

3.1 Toy Example

SHMs are plotted with the spatial_hm function. To provide a quick and intuitive overview how these plots are generated, the following uses a generalized toy example where a small vector of random numeric values is generated that are used to color features in an aSVG image. The image chosen for this example is an aSVG depicting the human brain. The corresponding image file ‘homo_sapiens.brain.svg’ is included in this package for testing purposes. The path to this image on a user's system, where spatialHeatmap is installed, can be obtained with the system.file function.

3.1.1 aSVG Image

The following commands obtain the directory of the aSVG collection and the full path to the chosen target aSVG image on a user’s system, respectively.

svg.dir <- system.file("extdata/shinyApp/example", package="spatialHeatmap")
svg.hum <- system.file("extdata/shinyApp/example", 'homo_sapiens.brain.svg', package="spatialHeatmap")

To identify features of interest in annotated aSVG images, the return_feature function can be used. The following searches the aSVG images stored in dir for the query terms ‘lobe’ and ‘homo sapiens’ under the feature and species fields, respectively. The identified matches are returned as a data.frame.

feature.df <- return_feature(feature=c('lobe'), species=c('homo sapiens'), remote=FALSE, dir=svg.dir)
## Accessing features... 
## arabidopsis_thaliana.root_cross.svg, gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,
feature.df
##          feature             id                    SVG    parent
## 1 occipital lobe UBERON_0002021 homo_sapiens.brain.svg LAYER_EFO
## 2  parietal lobe UBERON_0001872 homo_sapiens.brain.svg LAYER_EFO
## 3  temporal lobe UBERON_0001871 homo_sapiens.brain.svg LAYER_EFO
##   index index1
## 1     9      7
## 2    10      8
## 3    26     24
fnames <- feature.df[, 1]

3.1.2 Numeric Data

The following example generates a small numeric toy vector, where the data slot contains four numbers and its name slot is populated with the three feature names obtained from the above aSVG image. In addition, a non-matching entry (here ‘notMapped’) is included for demonstration purposes. Note, the numbers are mapped to features via matching names among the numeric vector and the aSVG, respectively. Accordingly, only numbers and features with matching name counterparts can be colored in the aSVG image. Entries without name matches are indicated by a message printed to the R console, here “notMapped”. This behavior can be turned off with verbose=FALSE in the corresponding function call. In addition, a summary of the numeric assay to feature mappings is stored in the result data.frame returned by the spatial_hm function (see below).

my_vec <- sample(1:100, length(unique(fnames))+1)
names(my_vec) <- c(unique(fnames), 'notMapped')
my_vec
## occipital lobe  parietal lobe  temporal lobe      notMapped 
##             49              4             22             57

3.1.3 Plot SHM

Next, the SHM is plotted with the spatial_hm function (Figure 2). Internally, the numbers in my_vec are translated to colors based on the color key assigned to the col.com argument, and then painted onto the corresponding features in the aSVG, where the path to the image file is defined by svg.path=svg.hum. The remaining arguments used here include: ID for defining the title of the plot; ncol for setting the column-wise layout of the plot excluding the feature legend plot on the right; and height for defining the height of the SHM relative to its width. In the given example (Figure 2) only three features in my_vec (‘occipital lobe’, ‘parietal lobe’, and ‘temporal lobe’) have matching entries in the corresponding aSVG.

shm.df <- spatial_hm(svg.path=svg.hum, data=my_vec, ID='toy', ncol=1, height=0.7, sub.title.size=20)
## Syntactically valid column names are made! 
## Enrties not mapped: notMapped
SHM of human brain with toy data. The plots from left to right represent color key, SHM and legend. The colors in the first two plots depict the user provided numeric values, whereas in the legend plot they are used to map the feature labels to the corresponding spatial regions in the image.

Figure 2: SHM of human brain with toy data
The plots from left to right represent color key, SHM and legend. The colors in the first two plots depict the user provided numeric values, whereas in the legend plot they are used to map the feature labels to the corresponding spatial regions in the image.

The named numeric values in my_vec, that have name matches with the features in the chosen aSVG, are stored in the mapped_feature slot.

# The SHM and mapped features are stored in a list
names(shm.df)
## [1] "spatial_heatmap" "mapped_feature"
# Mapped features
shm.df[['mapped_feature']]
##   rowID     featureSVG value
## 1   toy occipital.lobe    49
## 2   toy  parietal.lobe     4
## 3   toy  temporal.lobe    22

3.2 Human Brain

This subsection introduces how to find cell- and tissue-specific assay data in the Expression Atlas database. After choosing a gene expression experiment, the data is downloaded directly into a user's R session. Subsequently, the expression values for selected genes can be plotted onto a chosen aSVG image with or without prior preprocessing steps (e.g. normalization). For querying and downloading expression data from the Expression Atlas database, functions from the ExpressionAtlas package are used (Keays 2019).

3.2.1 Gene Expression Data

The following example searches the Expression Atlas for expression data derived from specific tissues and species of interest, here ‘cerebellum’ and ‘Homo sapiens’, respectively.

all.hum <- searchAtlasExperiments(properties="cerebellum", species="Homo sapiens")

The search result is stored in a DFrame containing 13 accessions matching the above query. For the following sample code, the accession ‘E-GEOD-67196’ from Prudencio et al. (2015) has been chosen, which corresponds to an RNA-Seq profiling experiment of ‘cerebellum’ and ‘frontal cortex’ brain tissue from patients with amyotrophic lateral sclerosis (ALS). Details about the corresponding record can be returned as follows.

all.hum[2, ]
## DataFrame with 1 row and 4 columns
##      Accession      Species                  Type
##    <character>  <character>           <character>
## 1 E-GEOD-67196 Homo sapiens RNA-seq of coding RNA
##                                                                                                                                   Title
##                                                                                                                             <character>
## 1 Transcription profiling by high throughput sequencing of cerebellum and frontal cortex from patients of amyotrophic lateral sclerosis

The getAtlasData function allows to download the chosen RNA-Seq experiment from the Expression Atlas and import it into a RangedSummarizedExperiment object of a user's R session.

rse.hum <- getAtlasData('E-GEOD-67196')[[1]][[1]]

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.hum. The following returns only its first five rows and columns.

colData(rse.hum)[1:5, 1:5]
## DataFrame with 5 rows and 5 columns
##            AtlasAssayGroup     organism   individual
##                <character>  <character>  <character>
## SRR1927019              g1 Homo sapiens  individual1
## SRR1927020              g2 Homo sapiens  individual1
## SRR1927021              g1 Homo sapiens  individual2
## SRR1927022              g2 Homo sapiens  individual2
## SRR1927023              g1 Homo sapiens individual34
##             organism_part                       disease
##               <character>                   <character>
## SRR1927019     cerebellum amyotrophic lateral sclerosis
## SRR1927020 frontal cortex amyotrophic lateral sclerosis
## SRR1927021     cerebellum amyotrophic lateral sclerosis
## SRR1927022 frontal cortex amyotrophic lateral sclerosis
## SRR1927023     cerebellum amyotrophic lateral sclerosis

3.2.2 aSVG Image

The following example shows how to download from the previously mentioned SVG repositories an aSVG image that matches the tissues and species assayed in the gene expression data set downloaded in the previous subsection. The return_feature function queries the repository for feature- and species-related keywords, here c('frontal cortex', 'cerebellum') and c('homo sapiens', 'brain'), respectively. To return aSVGs with at least one feature and one species match,4 jianhai-comment: at least one feature cannot be guaranteed, since the SVG repo is not huge. “To return all possible matching aSVGs” is more accurate. the argument keywords.any is set to TRUE by default. When return.all=FALSE, only aSVGs matching the query keywords are returned and saved under dir. Otherwise, all aSVGs are returned regardless of the keywords. To avoid overwriting of existing SVG files, it is recommended to start with an empty target directory, here ~/test. To search a local directory for matching aSVG images, the argument setting remote=FALSE needs to be used, while specifying the path of the corresponding directory under dir. All or only matching features are returned if match.only is set to FALSE or TRUE, respectively.

dir.create('~/test') # Create empty directory
feature.df <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('homo sapiens', 'brain'), keywords.any=TRUE, return.all=FALSE, dir='~/test', remote=TRUE, match.only=TRUE, desc=FALSE) # Query aSVGs
feature.df[1:8, ] # Return first 8 rows for checking
unique(feature.df$SVG) # Return all matching aSVGs

To build this vignettes according to the R/Bioconductor package requirements, the following code section uses the aSVG file instance included in the spatialHeatmap package rather than the downloaded instance from the previous example.

feature.df <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('homo sapiens', 'brain'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE)
## Accessing features... 
## arabidopsis_thaliana.root_cross.svg, gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,

Note, the target tissues frontal cortex and cerebellum are included in both the experimental design slot of the downloaded expression data as well as the annotations of the aSVG. This way these features can be colored in the downstream SHM plots. If necessary users can also change from within R the feature identifiers and names in an aSVG. Details on this utility are provided in the Supplement.

feature.df
##                feature             id                    SVG
## 1 middle frontal gyrus UBERON_0002702 homo_sapiens.brain.svg
## 2     cingulate cortex UBERON_0003027 homo_sapiens.brain.svg
## 3    prefrontal cortex UBERON_0000451 homo_sapiens.brain.svg
## 4       frontal cortex UBERON_0001870 homo_sapiens.brain.svg
## 5      cerebral cortex UBERON_0000956 homo_sapiens.brain.svg
## 6           cerebellum UBERON_0002037 homo_sapiens.brain.svg
##      parent index index1
## 1 LAYER_EFO     8      6
## 2 LAYER_EFO    21     19
## 3 LAYER_EFO    23     21
## 4 LAYER_EFO    24     22
## 5 LAYER_EFO    25     23
## 6 LAYER_EFO    27     25

Since the Expression Atlas supports the cross-species anatomy ontology, the corresponding UBERON identifiers are included in the id column of the data.frame returned by the above function call of return_feature (Mungall et al. 2012). This ontology is also supported by the rols Bioconductor package (L. Gatto 2019).

3.2.3 Experimental Design

For organizing experimental designs and downstream plotting purposes, it can be desirable to shorten the text in certain columns of colData. This way one can use the source data for displaying ‘pretty’ sample names in columns and legends of all downstream tables and plots, respectively, in a consistent and automated manner. To achieve this, the following example imports a ‘targets’ file that can be generated and edited by the user in a text or spreadsheet program. In the following example the target file content is used to replace the text in the colData slot with a shortened version.

The following imports a custom target file containing simplified sample labels and experimental design information.

hum.tar <- system.file('extdata/shinyApp/example/target_human.txt', package='spatialHeatmap')
target.hum <- read.table(hum.tar, header=TRUE, row.names=1, sep='\t')

Load custom target data into colData slot.

colData(rse.hum) <- DataFrame(target.hum)

A slice of the simplified colData object is shown below, where the disease column contains now shorter labels than in the original data set. Additional details for generating and using target files in spatialHeatmap are provided in the Supplement of this vignette.

colData(rse.hum)[c(1:3, 41:42), 4:5]
## DataFrame with 5 rows and 2 columns
##             organism_part     disease
##               <character> <character>
## SRR1927019     cerebellum         ALS
## SRR1927020 frontal cortex         ALS
## SRR1927021     cerebellum         ALS
## SRR1927059     cerebellum      normal
## SRR1927060 frontal cortex      normal

3.2.4 Preprocess Assay Data

The actual gene expression data of the downloaded RNA-Seq experiment is stored in the assay slot of rse.hum. Since it contains raw count data, it can be desirable to apply basic preprocessing routines prior to plotting spatial heatmaps. The following shows how to normalize the count data, aggregate replicates and then remove genes with unreliable expression responses. These preprocessing steps are optional and can be skipped if needed. For this, the expression data can be provided to the spatial_hm function directly, where it is important to assign to the sam.factor and con.factor arguments the corresponding sample and condition column names (Table 2).

For normalizing raw count data from RNA-Seq experiments, the norm_data function can be used. It supports the following pre-existing functions from widely used packages for analyzing count data in the next generation sequencing (NGS) field: calcNormFactors (CNF) from edgeR (Robinson, McCarthy, and Smyth 2010); as well as estimateSizeFactors (EST), varianceStabilizingTransformation (VST), and rlog from DESeq2 (M. I. Love, Huber, and Anders 2014). The argument norm.fun specifies one of the four internal normalizing methods: CNF, EST, VST, and rlog. If norm.fun='none', no normalization is applied. The arguments for each normalizing function are provided via a parameter.list, which is a list with named slots. For example, norm.fun='ESF' and parameter.list=list(type='ratio') is equivalent to estimateSizeFactors(object, type='ratio'). If paramter.list=NULL, the default arguments are used by the normalizing function assigned to norm.fun. For additional details, users want to consult the help file of the norm_data function by typing ?norm_data in the R console.

The following example uses the ESF normalization option. This method has been chosen mainly due to its good time performance.

se.nor.hum <- norm_data(data=rse.hum, norm.fun='ESF', data.trans='log2')
## Normalising: ESF 
##    type 
## "ratio"

Replicates are aggregated with the aggr_rep function, where the summary statistics can be chosen under the aggr argument (e.g. aggr='mean'). The columns specifying replicates can be assigned to the sam.factor and con.factor arguments corresponding to samples and conditions, respectively. For tracking, the corresponding sample/condition labels are used as column titles in the aggregated assay instance, where they are concatenated with a double underscore as separator. In addition, the corresponding rows in the colData slot are collapsed accordingly.

se.aggr.hum <- aggr_rep(data=se.nor.hum, sam.factor='organism_part', con.factor='disease', aggr='mean')
## Syntactically valid column names are made!
assay(se.aggr.hum)[1:3, ]
##                 cerebellum__ALS frontal.cortex__ALS
## ENSG00000000003        7.024054            7.091484
## ENSG00000000005        0.000000            1.540214
## ENSG00000000419        7.866582            8.002549
##                 cerebellum__normal frontal.cortex__normal
## ENSG00000000003           6.406157               7.004446
## ENSG00000000005           0.000000               1.403110
## ENSG00000000419           8.073264               7.955709

To remove unreliable expression measures, filtering can be applied. The following example eliminates genes with expression values larger than 5 (log2 space) in at least 1% of all samples (pOA=c(0.01, 5)), while retaining genes with a coefficient of variance (CV) between 0.30 and 100 (CV=c(0.30, 100)).

se.fil.hum <- filter_data(data=se.aggr.hum, sam.factor='organism_part', con.factor='disease', pOA=c(0.01, 5), CV=c(0.3, 100), dir=NULL)
## Syntactically valid column names are made!

To inspect the results, the following returns three selected rows of the fully preprocessed data matrix (Table 1).

assay(se.fil.hum)[c(5, 733:734), ]

Table 1: Slice of fully preprocessed expression matrix.
cerebellum__ALS frontal.cortex__ALS cerebellum__normal frontal.cortex__normal
ENSG00000006047 1.134172 5.2629629 0.5377534 5.3588310
ENSG00000268433 5.324064 0.3419665 3.4780744 0.1340332
ENSG00000268555 5.954572 2.6148548 4.9349736 2.0351776

3.2.5 SHM: Single Gene

The preprocessed expression values for any gene in the assay slot of se.fil.hum can be plotted as a SHM. The following uses gene ENSG00000268433 as an example. The chosen aSVG is a depiction of the human brain where the assayed featured are colored by the corresponding expression values in se.fil.hum.

shm.df <- spatial_hm(svg.path=svg.hum, data=se.fil.hum, ID=c('ENSG00000268433'), height=0.6, legend.r=1.3)
SHM of human brain. Only cerebellum and frontal cortex are colored, because they are present in both the aSVG and the expression data. The legend plot on the right maps the feature labels to the corresponding spatial regions in the image.

Figure 3: SHM of human brain
Only cerebellum and frontal cortex are colored, because they are present in both the aSVG and the expression data. The legend plot on the right maps the feature labels to the corresponding spatial regions in the image.

The plotting instructions of the SHM along with the corresponding mapped features are stored in a list named shm.df. Its components can be accessed as follows.

names(shm.df)
## [1] "spatial_heatmap" "mapped_feature"
# Mapped features
shm.df[['mapped_feature']]
##             rowID     featureSVG condition     value
## 1 ENSG00000268433     cerebellum       ALS 5.3240638
## 2 ENSG00000268433 frontal.cortex       ALS 0.3419665
## 3 ENSG00000268433     cerebellum    normal 3.4780744
## 4 ENSG00000268433 frontal.cortex    normal 0.1340332

In the above example, the normalized expression values of gene ENSG00000268433 are colored in the frontal cortex and cerebellum, where the different conditions, here normal and ALS, are given in separate SHMs plotted next to each other (Figure 3). The color and feature mappings are defined by the corresponding color key and legend plot on the left and right, respectively.

3.2.6 SHM: Multiple Genes

SHMs for multiple genes can be plotted by providing the corresponding gene IDs under the ID argument as a character vector. The spatial_hm function will then sequentially arrange the SHMs for each gene in a single composite plot. To facilitate comparisons of expression values across genes and/or conditions, the lay.shm parameter can be assigned 'gene' or 'con', respectively. For instance, in Figure 4 the SHMs of the genes ENSG00000268433 and ENSG00000006047 are organized by condition in a horizontal view. This functionality is particularly useful when comparing gene families.

spatial_hm(svg.path=svg.hum, data=se.fil.hum, ID=c('ENSG00000268433', 'ENSG00000006047'), lay.shm='con', width=1, height=1, legend.r=1.5)
SHMs of two genes. The subplots are organized by "condition" through `lay.shm` argument.

Figure 4: SHMs of two genes
The subplots are organized by “condition” through lay.shm argument.

3.2.7 SHM: HTML and Video

There is an option to save the SHMs as HTML and video files through the argument out.dir, which is the directory to save these files. Each HTML file contains an interactive SHM, where features can be zoomed in and out, and a tooltip of feature, data, gene, condition is presented by hovering over features. The video is a collection of all SHMs in order of genes or conditions.

3.2.8 SHM: Customization

To provide a high level of flexibility, the spatial_hm contains many arguments. An overview of the main argument names and their utility is provide in Table 2.


Table 2: Argument list (partial) of ‘spatial_hm’.
argument description
svg.path Path of aSVG
data Input data of SummarizedExperiment (SE), data frame, or vector
sam.factor Applies to SE. Column name of sample replicates in colData slot. Default is NULL
con.factor Applies to SE. Column name of condition replicates in colData slot. Default is NULL
ID A character vector of row items for plotting spatial heatmaps
col.com A character vector of color components for building colour scale. Default is c(‘purple’, ‘yellow’, ‘blue’)
col.bar ‘selected’ or ‘all’, the former means use values of ID to build the colour scale while the latter use all values in data. Default is ‘selected’.
bar.width A numeric of colour bar width. Default is 0.7
data.trans ‘log2’, ‘exp2’, or NULL, ‘log2’ transforms data to log2 scale for plotting while ‘exp2’ to 2-base exponent. Default is NULL, no transformation.
tis.trans A vector of aSVG features to be transparent. Default is NULL.
width, height Two numerics of overall width and height of all subplots, ranging between 0 and 1 repsectively. Default is 1, 1.
legend.r A numeric to adjust the dimension of the legend plot. Default is 1. The larger, the higher ratio of width to height.
sub.title.size The title size of each spatial heatmap subplot. Default is 11.
lay.shm ‘gen’ or ‘con’, applies to multiple genes or conditions respectively. ‘gen’ means spatial heatmaps are organised by genes while ‘con’ organised by conditions. Default is ‘gen’
ncol The total column number of spatial heatmaps, not including legend plot. Default is 2.
sam.legend ‘identical’, ‘all’, or a vector of samples/features in aSVG to show in legend plot. ‘identical’ only shows matching features while ‘all’ shows all features.
legend.ncol, legend.nrow Two numbers of columns and rows of legend keys respectively. Default is NULL, NULL, since they are automatically set.
legend.position the position of legend keys (‘none’, ‘left’, ‘right’,‘bottom’, ‘top’), or two-element numeric vector. Default is ‘bottom’.
legend.key.size, legend.label.size The size of legend keys and labels respectively. Default is 0.5 and 8 respectively.
line.size, line.color The size and colour of all plogyon outlines respectively. Default is 0.2 and ‘grey70’ respectively.
verbose TRUE or FALSE. Default is TRUE and the aSVG features not mapped are printed to R console.
out.dir The directory to save animation and video of spatial heatmaps. Default is NULL.

3.3 Mouse Organs

This section generates an SHM plot for mouse data from the Expression Atlas. The code components are very similar to the previous Human Brain example. For brevity, the corresponding text explaining the code has been reduced to a minimum.

3.3.1 Gene Expression Data

The chosen mouse RNA-Seq data compares tissue level gene expression across mammalian species (Merkin et al. 2012). The following searches the Expression Atlas for expression data from ‘heart’ and ‘Mus musculus’.

all.mus <- searchAtlasExperiments(properties="heart", species="Mus musculus")

Among the many matching entries, accession ‘E-MTAB-2801’ will be downloaded.

all.mus[7, ]
## DataFrame with 1 row and 4 columns
##     Accession      Species                  Type
##   <character>  <character>           <character>
## 1 E-MTAB-2801 Mus musculus RNA-seq of coding RNA
##                                           Title
##                                     <character>
## 1 Strand-specific RNA-seq of nine mouse tissues
rse.mus <- getAtlasData('E-MTAB-2801')[[1]][[1]]

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.mus. The following returns only its first three rows.

colData(rse.mus)[1:3, ]
## DataFrame with 3 rows and 4 columns
##           AtlasAssayGroup     organism organism_part      strain
##               <character>  <character>   <character> <character>
## SRR594393              g7 Mus musculus         brain      DBA/2J
## SRR594394             g21 Mus musculus         colon      DBA/2J
## SRR594395             g13 Mus musculus         heart      DBA/2J

3.3.2 aSVG Image

The following example shows how to download from the previously mentioned SVG repositories an aSVG image that matches the tissues and species assayed in the gene expression data set downloaded in the previous subsection. As before the image is saved to a directory named ~/test.

if (!dir.exists('~/test')) dir.create('~/test')
feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('Mus musculus'), keywords.any=TRUE, return.all=FALSE, dir='~/test', remote=TRUE, match.only=FALSE)

To build this vignettes according to the R/Bioconductor package requirements, the following code section uses the aSVG file instance included in the spatialHeatmap package rather than the downloaded instance from the previous example.

feature.df <- return_feature(feature=c('heart', 'kidney'), species=NULL, keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE) 
## Accessing features... 
## arabidopsis_thaliana.root_cross.svg, gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,

Return the names of the matching aSVG files.

unique(feature.df$SVG)
## [1] "gallus_gallus.svg"     "mus_musculus.male.svg"

The following first selects mus_musculus.male.svg as target aSVG, then returns the first three rows of the resulting feature.df, and finally prints the unique set of all aSVG features.

feature.df <- subset(feature.df, SVG=='mus_musculus.male.svg')
feature.df[1:3, ]
##     feature             id                   SVG        parent
## 10   kidney UBERON_0002113 mus_musculus.male.svg     LAYER_EFO
## 11    heart UBERON_0000948 mus_musculus.male.svg     LAYER_EFO
## 12 path4204       path4204 mus_musculus.male.svg LAYER_OUTLINE
##    index index1
## 10    14     12
## 11    51     49
## 12     1      1
unique(feature.df[, 1])
##  [1] "kidney"                    "heart"                    
##  [3] "path4204"                  "aorta"                    
##  [5] "circulatory system"        "blood vessel"             
##  [7] "brown adipose tissue"      "white adipose tissue"     
##  [9] "skin"                      "stomach"                  
## [11] "duodenum"                  "pancreas"                 
## [13] "spleen"                    "adrenal gland"            
## [15] "colon"                     "small intestine"          
## [17] "caecum"                    "jejunum"                  
## [19] "ileum"                     "esophagus"                
## [21] "gall bladder"              "parotid gland"            
## [23] "submandibular gland"       "lymph node"               
## [25] "parathyroid gland"         "tongue"                   
## [27] "Peyer’s patch"             "prostate gland"           
## [29] "vas deferens"              "epididymis"               
## [31] "testis"                    "seminal vesicle"          
## [33] "penis"                     "urinary bladder"          
## [35] "thymus"                    "femur"                    
## [37] "bone marrow"               "cartilage"                
## [39] "quadriceps femoris"        "spinal cord"              
## [41] "lung"                      "diaphragm"                
## [43] "peripheral nervous system" "trachea"                  
## [45] "hindlimb"                  "trigeminal nerve"         
## [47] "eye"                       "sciatic nerve"            
## [49] "intestinal mucosa"         "liver"                    
## [51] "brain"                     "skeletal muscle"

Obtain path of target aSVG on user system.

svg.mus <- system.file("extdata/shinyApp/example", "mus_musculus.male.svg", package="spatialHeatmap")

3.3.3 Experimental Design

Import of custom target file defining simplified sample labels and experimental design. The following imports a sample target file that is included in this package. To inspect its content, the first three rows of the target file are printed to the screen.

mus.tar <- system.file('extdata/shinyApp/example/target_mouse.txt', package='spatialHeatmap')
target.mus <- read.table(mus.tar, header=TRUE, row.names=1, sep='\t')
target.mus[1:3, ]
##           AtlasAssayGroup     organism organism_part strain
## SRR594393              g7 Mus musculus         brain DBA.2J
## SRR594394             g21 Mus musculus         colon DBA.2J
## SRR594395             g13 Mus musculus         heart DBA.2J
unique(target.mus[, 3])
## [1] "brain"           "colon"           "heart"          
## [4] "kidney"          "liver"           "lung"           
## [7] "skeletal muscle" "spleen"          "testis"

Load custom target data into colData slot.

colData(rse.mus) <- DataFrame(target.mus)

3.3.4 Preprocess Assay Data

The raw RNA-Seq count are preprocessed with the following steps: (1) normalization, (2) aggregation of replicates, and (3) filtering of reliable expression data. The details of these steps are explained in the previous sub-section using data from human.

se.nor.mus <- norm_data(data=rse.mus, norm.fun='ESF', data.trans='log2') # Normalization
## Normalising: ESF 
##    type 
## "ratio"
se.aggr.mus <- aggr_rep(data=se.nor.mus, sam.factor='organism_part', con.factor='strain', aggr='mean') # Aggregation of replicates
## Syntactically valid column names are made!
se.fil.mus <- filter_data(data=se.aggr.mus, sam.factor='organism_part', con.factor='strain', pOA=c(0.01, 5), CV=c(0.6, 100), dir=NULL) # Filtering of genes with low counts and variance 
## Syntactically valid column names are made!

3.3.5 SHM: Transparency

The pre-processed expression data for gene ‘ENSMUSG00000000263’ is plotted in form of an SHM. In this case the plot includes expression data for 8 tissues across 3 mouse strains.

spatial_hm(svg.path=svg.mus, data=se.fil.mus, ID=c('ENSMUSG00000000263'), height=0.5, legend.r=1.1, sub.title.size=9, ncol=3, tis.trans=c('skeletal muscle'), legend.nrow=4, line.size=0.2, line.color='grey70')
SHM of mouse organs. This is a multiple-layer image where the polygon of the 'skeletal muscle' is set transparent to expose 'lung' and 'heart'.

Figure 5: SHM of mouse organs
This is a multiple-layer image where the polygon of the ‘skeletal muscle’ is set transparent to expose ‘lung’ and ‘heart’.

The SHM plots in Figures 5 and 6 demonstrate the usage of the transparency feature via the tis.trans parameter. The corresponding mouse organ aSVG image includes overlapping tissue layers. In this case the skelectal muscle layer partially overlaps with lung and heart tissues. To view lung and heart in Figure 5, the skelectal muscle tissue is set transparent with tis.trans=c('skeletal muscle'). To view in the same aSVG the skeletal muscle tissue instead, tis.trans is assigned NULL for generating the SHM plot of Figure 6.

To fine control the visual effects in feature rich aSVGs, the line.size and line.color parameters are useful. This way one can adjust the thickness and color of complex polygon structures.

spatial_hm(svg.path=svg.mus, data=se.fil.mus, ID=c('ENSMUSG00000000263'), height=0.5, legend.r=1.1, sub.title.size=9, ncol=3, tis.trans=NULL, legend.ncol=2, line.size=0.2, line.color='grey70')
SHM of mouse organs. This is a multiple-layer image where the view onto 'lung' and 'heart' is obstructed by displaying the 'skeletal muscle' tissue.

Figure 6: SHM of mouse organs
This is a multiple-layer image where the view onto ‘lung’ and ‘heart’ is obstructed by displaying the ‘skeletal muscle’ tissue.

3.4 Chicken Organs

This section generates an SHM plot for chicken data from the Expression Atlas. The code components are very similar to the Human Brain example. For brevity, the corresponding text explaining the code has been reduced to a minimum.

3.4.1 Gene Expression Data

The chosen chicken RNA-Seq experiment compares the developmental changes across nine time points of seven organs (Cardoso-Moreira et al. 2019).

The following searches the Expression Atlas for expression data from ‘heart’ and ‘gallus’.

all.chk <- searchAtlasExperiments(properties="heart", species="gallus")

Among the matching entries, accession ‘E-MTAB-6769’ will be downloaded.

all.chk[3, ]
## DataFrame with 1 row and 4 columns
##     Accession       Species                  Type
##   <character>   <character>           <character>
## 1 E-MTAB-6769 Gallus gallus RNA-seq of coding RNA
##                                                                  Title
##                                                            <character>
## 1 Chicken RNA-seq time-series of the development of seven major organs
rse.chk <- getAtlasData('E-MTAB-6769')[[1]][[1]]

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.chk. The following returns only its first three rows.

colData(rse.chk)[1:3, ]
## DataFrame with 3 rows and 8 columns
##            AtlasAssayGroup      organism         strain
##                <character>   <character>    <character>
## ERR2576379              g1 Gallus gallus Red Junglefowl
## ERR2576380              g1 Gallus gallus Red Junglefowl
## ERR2576381              g2 Gallus gallus Red Junglefowl
##                      genotype developmental_stage         age
##                   <character>         <character> <character>
## ERR2576379 wild type genotype              embryo      10 day
## ERR2576380 wild type genotype              embryo      10 day
## ERR2576381 wild type genotype              embryo      10 day
##                    sex organism_part
##            <character>   <character>
## ERR2576379      female         brain
## ERR2576380      female         brain
## ERR2576381      female    cerebellum

3.4.2 aSVG Image

The following example shows how to download from the previously mentioned SVG repositories an aSVG image that matches the tissues and species assayed in the gene expression data set downloaded in the previous subsection. As before the image is saved to a directory named ~/test.

# Make an empty directory "~/test" if not exist.
if (!dir.exists('~/test')) dir.create('~/test')
# Query aSVGs.
feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('gallus'), keywords.any=TRUE, return.all=FALSE, dir='~/test', remote=TRUE, match.only=FALSE)

To build this vignettes according to the R/Bioconductor package requirements, the following code section uses the aSVG file instance included in the spatialHeatmap package rather than the downloaded instance from the previous example.

feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('gallus'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE)
## Accessing features... 
## arabidopsis_thaliana.root_cross.svg, gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,
feature.df
##                 feature              id               SVG
## 1                 heart  UBERON_0000948 gallus_gallus.svg
## 2                kidney  UBERON_0002113 gallus_gallus.svg
## 3       chicken_outline chicken_outline gallus_gallus.svg
## 4                 brain  UBERON_0000955 gallus_gallus.svg
## 5                 liver  UBERON_0002107 gallus_gallus.svg
## 6 skeletal muscle organ  UBERON_0014892 gallus_gallus.svg
## 7                 colon  UBERON_0001155 gallus_gallus.svg
## 8                spleen  UBERON_0002106 gallus_gallus.svg
## 9                  lung  UBERON_0002048 gallus_gallus.svg
##          parent index index1
## 1     LAYER_EFO     4      2
## 2     LAYER_EFO     5      3
## 3 LAYER_OUTLINE     1      1
## 4     LAYER_EFO     3      1
## 5     LAYER_EFO     6      4
## 6     LAYER_EFO     7      5
## 7     LAYER_EFO     8      6
## 8     LAYER_EFO     9      7
## 9     LAYER_EFO    10      8

Obtain path of target aSVG on user system.

svg.chk <- system.file("extdata/shinyApp/example", "gallus_gallus.svg", package="spatialHeatmap")

3.4.3 Experimental Design

Import of custom target file defining simplified sample labels and experimental design. The following imports a sample target file that is included in this package. To inspect its content, the first three rows of the target file are printed to the screen.

chk.tar <- system.file('extdata/shinyApp/example/target_chicken.txt', package='spatialHeatmap')
target.chk <- read.table(chk.tar, header=TRUE, row.names=1, sep='\t')
target.chk[1:3, ]
##            AtlasAssayGroup      organism         strain
## ERR2576379              g1 Gallus gallus Red Junglefowl
## ERR2576380              g1 Gallus gallus Red Junglefowl
## ERR2576381              g2 Gallus gallus Red Junglefowl
##                      genotype developmental_stage   age    sex
## ERR2576379 wild type genotype              embryo day10 female
## ERR2576380 wild type genotype              embryo day10 female
## ERR2576381 wild type genotype              embryo day10 female
##            organism_part
## ERR2576379         brain
## ERR2576380         brain
## ERR2576381    cerebellum

Load custom target data into colData slot.

colData(rse.chk) <- DataFrame(target.chk)

All samples used for plotting SHMs.

unique(colData(rse.chk)[, 'organism_part'])
## [1] "brain"      "cerebellum" "heart"      "kidney"    
## [5] "ovary"      "testis"     "liver"

Return conditions considered for plotting downstream SHM.

unique(colData(rse.chk)[, 'age'])
## [1] "day10"  "day12"  "day14"  "day17"  "day0"   "day155"
## [7] "day35"  "day7"   "day70"

3.4.4 Preprocess Assay Data

The raw RNA-Seq count are preprocessed with the following steps: (1) normalization, (2) aggregation of replicates, and (3) filtering of reliable expression data. The details of these steps are explained in the above sub-section using data from human.

se.nor.chk <- norm_data(data=rse.chk, norm.fun='ESF', data.trans='log2') # Normalization
## Normalising: ESF 
##    type 
## "ratio"
se.aggr.chk <- aggr_rep(data=se.nor.chk, sam.factor='organism_part', con.factor='age', aggr='mean') # Replicate agggregation using mean 
se.fil.chk <- filter_data(data=se.aggr.chk, sam.factor='organism_part', con.factor='age', pOA=c(0.01, 5), CV=c(0.6, 100), dir=NULL) # Filtering of genes with low counts and varince

3.4.5 SHM: Time Series

The expression profile for gene ENSGALG00000006346 is plotted across nine time points in four organs in form of a composite SHM with 9 panels. Their layout in three columns is controlled with the argument setting ncol=3.

jianhai-reply: reordered.5 The order of the time points in Figure 7 should be by time. Right now it is rather random.

spatial_hm(svg.path=svg.chk, data=se.fil.chk, ID='ENSGALG00000006346', legend.r=1.5, sub.title.size=9, ncol=3, legend.nrow=2)
## Enrties not mapped: cerebellum, ovary, testis
Time course of chicken organs. The SHM shows the expression profile of a single gene across nine time points and four organs.

Figure 7: Time course of chicken organs
The SHM shows the expression profile of a single gene across nine time points and four organs.

3.5 Arabidopsis Shoot

This section generates an SHM for Arabidopsis thaliana with gene expression data from the Affymetrix microarray technology. The chosen experiment used ribosome-associated mRNAs from several cell populations of shoots and roots that were exposed to hypoxia stress (Mustroph et al. 2009). In this case the expression data will be downloaded from GEO with utilites from the GEOquery package (S. Davis and Meltzer 2007) and the preprocessing routines will be specific to the Affymetrix technology. The remaining code components for generating SHMs are very similar to the previous examples. For brevity, the text in this section explains mainly the steps that are specific to this data set.

3.5.1 Gene Expression Data

The GSE14502 data set is downloaded with the getGEO function from the GEOquery package. Intermediately, the expression data is stored in an ExpressionSet container (W. Huber et al. 2015) and then converted to a SummarizedExperiment object.

gset <- getGEO("GSE14502", GSEMatrix=TRUE, getGPL=TRUE)[[1]]
se.sh <- as(gset, "SummarizedExperiment")

The gene symbol identifiers are extracted from the rowData component and used as row names. Similarly, one can work with AGI identifiers by providing below AGI under Gene.Symbol.

rownames(se.sh) <- make.names(rowData(se.sh)[, 'Gene.Symbol'])

The following returns a slice of the experimental design stored in the colData slot. Both the samples and conditions are contained in the title column. The samples include promoters (pGL2, pCO2, pSCR, pWOL, p35S), tissues and organs (root atrichoblast epidermis, root cortex meristematic zone, root endodermis, root vasculature, root_total and shoot_total); and the conditions are control and hypoxia.

colData(se.sh)[60:63, 1:4]
## DataFrame with 4 rows and 4 columns
##                              title geo_accession
##                        <character>   <character>
## GSM362227  shoot_hypoxia_pGL2_rep1     GSM362227
## GSM362228  shoot_hypoxia_pGL2_rep2     GSM362228
## GSM362229 shoot_control_pRBCS_rep1     GSM362229
## GSM362230 shoot_control_pRBCS_rep2     GSM362230
##                          status submission_date
##                     <character>     <character>
## GSM362227 Public on Oct 12 2009     Jan 21 2009
## GSM362228 Public on Oct 12 2009     Jan 21 2009
## GSM362229 Public on Oct 12 2009     Jan 21 2009
## GSM362230 Public on Oct 12 2009     Jan 21 2009

3.5.2 aSVG Image

In this example, the aSVG image has been regenerated in Inkscape from the corresponding figure in Mustroph et al. (2009). The resulting custom figure has been included as a sample aSVG file in the spatialHeatmap package. Detailed instructions for generating custom aSVG images in Inkscape are provided in the SVG tutorial.

The annotations in the corresponding aSVG file located under svg.dir can be queried with the return_features function.

feature.df <- return_feature(feature=c('pGL2', 'pRBCS'), species=c('shoot'), keywords.any=TRUE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE)
## Accessing features... 
## arabidopsis_thaliana.root_cross.svg, gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,

The unique set of the matching aSVG files can be returned as follows.

unique(feature.df$SVG)
## [1] "shoot_final.svg"      "shoot_root_final.svg"

The aSVG file arabidopsis_thaliana.shoot_shm.svg is chosen to generate the SHM in this section.

feature.df <- subset(feature.df, SVG=='arabidopsis_thaliana.shoot_shm.svg')
feature.df[1:3, ]
##      feature   id  SVG parent index index1
## NA      <NA> <NA> <NA>   <NA>    NA     NA
## NA.1    <NA> <NA> <NA>   <NA>    NA     NA
## NA.2    <NA> <NA> <NA>   <NA>    NA     NA

Obtain full path of target aSVG on user system.

svg.sh <- system.file("extdata/shinyApp/example", "arabidopsis_thaliana.shoot_shm.svg", package="spatialHeatmap")

3.5.3 Experimental Design

The following imports a sample target file that is included in this package. To inspect its content, four selected rows of this target file are printed to the screen.

sh.tar <- system.file('extdata/shinyApp/example/target_arab.txt', package='spatialHeatmap')
target.sh <- read.table(sh.tar, header=TRUE, row.names=1, sep='\t')
target.sh[60:63, ]
##                           col.name     samples conditions
## shoot_hypoxia_pGL2_rep1  GSM362227  shoot_pGL2    hypoxia
## shoot_hypoxia_pGL2_rep2  GSM362228  shoot_pGL2    hypoxia
## shoot_control_pRBCS_rep1 GSM362229 shoot_pRBCS    control
## shoot_control_pRBCS_rep2 GSM362230 shoot_pRBCS    control

Return all samples present in target file.

unique(target.sh[, 'samples'])
##  [1] "root_total"      "root_p35S"       "root_pSCR"      
##  [4] "root_pSHR"       "root_pWOL"       "root_pGL2"      
##  [7] "root_pSUC2"      "root_pSultr2.2"  "root_pCO2"      
## [10] "root_pPEP"       "root_pRPL11C"    "shoot_total"    
## [13] "shoot_p35S"      "shoot_pGL2"      "shoot_pRBCS"    
## [16] "shoot_pSUC2"     "shoot_pSultr2.2" "shoot_pCER5"    
## [19] "shoot_pKAT1"

Return all conditions present in target file.

unique(target.sh[, 'conditions'])
## [1] "control" "hypoxia"

Load custom target data into colData slot.

colData(se.sh) <- DataFrame(target.sh)

3.5.4 Preprocess Assay Data

The downloaded GSE14502 data set has already been normalized with the RMA algorithm (Gautier et al. 2004). Thus, the pre-processing steps can be restricted to the aggregation of replicates and filtering of reliably expressed genes. For the latter, the following code will retain genes with expression values larger than 6 (log2 space) in at least 3% of all samples (pOA=c(0.03, 6)), and with a coefficient of variance (CV) between 0.30 and 100 (CV=c(0.30, 100)).

se.aggr.sh <- aggr_rep(data=se.sh, sam.factor='samples', con.factor='conditions', aggr='mean') # Replicate agggregation using mean
se.fil.arab <- filter_data(data=se.aggr.sh, sam.factor='samples', con.factor='conditions', pOA=c(0.03, 6), CV=c(0.30, 100), dir=NULL) # Filtering of genes with low intensities and variance

3.5.5 SHM: Microarray Example

The expression profile for the HRE2 gene is plotted for control and hypoxia treatment across six cell types (Figure 8).

spatial_hm(svg.path=svg.sh, data=se.fil.arab, ID=c("HRE2"), height=0.6, legend.nrow=3, legend.r=1.3, legend.key.size=0.3)
## Enrties not mapped: root_total, root_p35S, root_pSCR, root_pSHR, root_pWOL, root_pGL2, root_pSUC2, root_pSultr2.2, root_pCO2, root_pPEP, root_pRPL11C, shoot_total, shoot_p35S
SHM of Arabidopsis shoots. The expression profile of the HRE2 gene is plotted for control and hypoxia treatment across six cell types.

Figure 8: SHM of Arabidopsis shoots
The expression profile of the HRE2 gene is plotted for control and hypoxia treatment across six cell types.

3.5.6 SHM: Growth-stage Example

The spatial_hm function also accepts multiple aSVG files, such as aSVGs of different growth stages. In this case, these aSVG files should be indexed with suffixes of _shm1, _shm2, etc, such as arabidopsis_thaliana.organ_shm1.svg and arabidopsis_thaliana.organ_shm2.svg in the following example. Without such suffixes, the order of aSVGs cannot be determined and an error will be reported. The paths of aSVG files should be provided to svg.path as a vector. To preserve the relative dimensions of original aSVGs, the argument preserve.scale should be set TRUE, which is useful if tissue/organ features need to reflect varying sizes across growth stages. By default, every aSVG image will have a legend plot on the right. If only specific legend plots are expected, the legend argument should be used, and accepted values are shm1, shm2, etc. The legend.width argument controls the width of legend plots. If a value of 0 is provided, all legend plots are removed.

In the example below, a random numeric data frame is generated and gene1 is mapped to two aSVGs of Arabidopsis plants, arabidopsis_thaliana.organ_shm1.svg and arabidopsis_thaliana.organ_shm2.svg, which are pre-packaged in spatialHeatmap. The former stands for a younger stage while the latter for an older stage. The samples shoot_totalA and shoot_totalB in the data frame are assumed to have matching feature counterpart in the younger and older plant aSVG respectively.

Create random numeric data.frame.

df.random <- data.frame(matrix(sample(x=1:100, size=50, replace=TRUE), nrow=10))
colnames(df.random) <- c('shoot_totalA__condition1', 'shoot_totalA__condition2', 'shoot_totalB__condition1', 'shoot_totalB__condition2', 'notMapped') # Assign column names
rownames(df.random) <- paste0('gene', 1:10) # Assign row names 
df.random[1:3, ]
##       shoot_totalA__condition1 shoot_totalA__condition2
## gene1                       23                       78
## gene2                       84                       44
## gene3                       66                        1
##       shoot_totalB__condition1 shoot_totalB__condition2
## gene1                       91                       85
## gene2                       22                       58
## gene3                       77                       83
##       notMapped
## gene1        12
## gene2        63
## gene3        12

Access aSVG paths of the younger and older plants.

# aSVG of growth stage 1 (younger).
svg.sh1 <- system.file("extdata/shinyApp/example", "arabidopsis_thaliana.organ_shm1.svg", package="spatialHeatmap")
# aSVG of growth stage 2 (older).
svg.sh2 <- system.file("extdata/shinyApp/example", "arabidopsis_thaliana.organ_shm2.svg", package="spatialHeatmap")

Plot SHMs on gene1. The orginal dimension relativity is preserved by preserve.scale=TRUE.

spatial_hm(svg.path=c(svg.sh1, svg.sh2), data=df.random, ID=c('gene1'), width=0.7, legend.r=0.9, legend.width=0.9, preserve.scale=TRUE) 
## Enrties not mapped: shoot_totalB, notMapped 
## Enrties not mapped: shoot_totalA, notMapped
SHMs of Arabidopsis at two growth stages. The expression profile of `gene1` under `condition1` and `condition2` is mapped to two growth stages.

Figure 9: SHMs of Arabidopsis at two growth stages
The expression profile of gene1 under condition1 and condition2 is mapped to two growth stages.

4 Matrix Heatmaps

SHMs are a visualization approach suitable for comparing the expression profiles of single genes or a small number of them across cell types and conditions. To also support analyses across larger number of genes, spatialHeatmap provides utilities for identifying for gene(s) of interest nearest neighbor genes that share similar expression profiles. This is achieved by identifying clusters and network modules using hierarchical clustering and network analysis, and the former is displayed as matrix heatmap while the latter as network graph . These approaches are described with the Arabidopsis Shoot data in this and the following sections, respectively.

The nearest neighbors of target genes are selected based on the correlation coefficient or distance between genes, where the default is Pearson correlation coefficient (PCC). The argument p is the proportion of genes showing most similar expression profiles with target genes. Only genes within this proportion are returned. Other two alternative threshold arguments are n and v. The former selects top n most similar genes, while the latter is a real PCC or distance value and only genes within this value are selected. Note, the nearest neighbors are selected for each target gene independently. In this example, the target genes are RCA and HRE2. The argument ann is the column name of gene annotation in rowData slot. It is only relevant if users want to see annotation when mousing over a node in the interactive network below, so it is optional to be assigned a value. Here ann='Target.Description' is set and the corresponding annotation is appended to selected nearest neighbors.

sub.mat <- submatrix(data=se.fil.arab, ann='Target.Description', ID=c('RCA', 'HRE2'), p=0.1)

The subsetted matrix (target genes and nearest neighbors) and the complete PCC matrix are returned in a list, and partial is shown below.

sub.mat[['sub_matrix']][c('RCA', 'HRE2'), c(1:3, 37)] # Subsetted data matrix.
##      root_total__control root_total__hypoxia root_p35S__control
## RCA             6.569305            6.416811           7.443822
## HRE2            5.486920           11.370161           5.578123
##                                                    Target.Description
## RCA  hypothetical protein ;supported by full-length cDNA: Ceres:7114.
## HRE2                         putative AP2 domain transcription factor
sub.mat[['cor_dist']][1:3, 1:3] # Correlation matrix.
##           ndhA      petL      psaJ
## ndhA 1.0000000 0.8991317 0.9189853
## petL 0.8991317 1.0000000 0.9585802
## psaJ 0.9189853 0.9585802 1.0000000

Hierarchical clustering is applied on the subsetted matrix and results are displayed as matrix heatmap in either static or interactive mode. Figure 10 is the static mode on gene RCA and HRE2. The rows (genes) and columns (samples) are sorted by hierarchical clustering dendrograms. Target genes are labeled by black lines. It is manifest that the two target genes belong to two separate patterns respectively. Setting static=FALSE launches the interactive mode, where users can zoom in for details by drawing a rectangle, and zoom out by double clicking the heatmap.

matrix_hm(ID=c('RCA', 'HRE2'), data=sub.mat[['sub_matrix']], angleCol=80, angleRow=35, cexRow=0.8, cexCol=0.8, margin=c(10, 6), static=TRUE, arg.lis1=list(offsetRow=0.01, offsetCol=0.01))
Matrix Heatmap. Rows are genes and columns are samples. The input genes are tagged by black lines.

Figure 10: Matrix Heatmap
Rows are genes and columns are samples. The input genes are tagged by black lines.

5 Network Graphs

In this section, network analysis is applied on the subsetted gene expression matrix from last section. It includes network module identification and visualization of modules in form of network graphs.

5.1 Adjacency Matrix and Module Identification

jianhai-reply: the blue text are responses to ThG-Comments.

[ThG-Comment: this section needs major improvements. The intent was to identify for a target gene nearest neighbors based on some similarity metric for gene expression profiles (e.g. correlation), and then hierarchically cluster them and visualize the result in a heatmap. Instead you are subjecting the entire expression matrix to clustering/WGCNA and then plot the target gene in the context of a single cluster/module. This is hugely inefficient compared to doing it only for the nearest neighbors instead. Just think about what will happen if a user imports an expression matrix for all transcripts from human. With your approach they will not be able to complete this step since they will run out of memory on their system.]

The WGCNA network analysis (Langfelder and Horvath 2008) is applied on the subsetted gene expression matrix to identify gene modules, which are clusters of genes showing highly similar expression profiles. First, a correlation matrix or distance matrix is computed on the subsetted matrix, which defaults to Pearson correlation or euclidean distance respectively. Then the computed matrix is transformed to an adjacency matrix. The adjacency matrix is characterized by sale-free topology, where only a small number of genes are expected to have many connections with other genes and thus dominate the whole network (Ravasz et al. 2002). Values in this adjacency matrix denotes expression similarity between genes. The larger value, the higher similarity.6 You need to describe how this adjency matrix is generated and what it represents. It also needs to mention what type of disance/correlation method is used initially to generate the downstream matrix.7 Without defining how adjacency is defined here readers will not be able to understand what was done here, especially since this section is mainly about hierarchical clustering.

Next, the adjacency matrix is used to calculate an advanced similarity measure of topological overlap matrix (TOM), which quantifies expression similarity for a pair of genes in the context of all other genes. Then, the TOM transformed distance matrix 1-TOM is used for hierarchical clustering with flashClust (Langfelder and Horvath 2012). Lastly, network modules are identified with the dynamicTreeCut package (Langfelder, Zhang, and Steve Horvath 2016). It provides an argument deepSplit (ds) for rough control over sensitivity to module splitting. The ds has four alternative values 0, 1, 2, 3, and each generates an alternative set of modules. The higher the value, the more and smaller modules are produced, thus there are four alternative sets of modules generated with varying sizes.8 What is the meaning of the sensitivity levels, how are they generated and how should the user interpret them?9 Unclear for what you need WGCNA here. I thought this section uses hierarchical clustering. The latter is certainly used by WGCNA but it remains a mystery what exactly you are doing here. I would expect to use here only hierarchical clustering, e.g. with flashClust. WGCNA seems more suitable for the next section.

The whole process above of identifying modules is implemented in the function adj_mod. It returns a list containing the adjacency matrix and a data frame of module assignment. Since the interactive network below performs better on smaller modules, only modules resulting from ds=2 and ds=3 are returned.

adj.mod <- adj_mod(data=sub.mat[['sub_matrix']])

Partial of the adjacency matrix is shown below.

adj.mod[['adj']][1:3, 1:3]
##                 CA1    PSAH.1 AT2G26500
## CA1       1.0000000 0.9514016 0.9636366
## PSAH.1    0.9514016 1.0000000 0.9611725
## AT2G26500 0.9636366 0.9611725 1.0000000

The module assignment is a data frame. The first column is ds=2 while the second is ds=3. The numbers in each column are module labels with 0 meaning genes not assigned to any modules. First three rows are shown below.

adj.mod[['mod']][1:3, ] 
##           2 3
## CA1       1 1
## PSAH.1    1 1
## AT2G26500 1 1

[ThG-Comment: both the previous section and this section need to clearly explain what is done to arrive at a given network module and what it represents. Once this is done I will edit the text.]

5.2 Module Visualization

To visualize modules, a target gene should be selected first, usually a gene from the SHMs. Then the module containing the target gene is internally selected based on the ds, which defaults to 3. There are two modes to visualize the selected module, static or interactive. Figure 11 is the static network containing gene HRE2, which is generated by static=TRUE. Nodes are genes and edges are adjacencies between genes. The thicker edge denotes higher adjacency while larger node indicates higher gene connectivity (sum of a gene's adjacency with all its direct neighbours). The target gene is labeled by ’_target’.

network(ID="HRE2", data=sub.mat[['sub_matrix']], adj.mod=adj.mod, adj.min=0.90, vertex.label.cex=1.2, vertex.cex=2, static=TRUE)
Static network. Node size denotes gene connectivity while edge thickness stands for co-expression similarity.

Figure 11: Static network
Node size denotes gene connectivity while edge thickness stands for co-expression similarity.

Setting static=FALSE launches the interactive network. There is an interactive color bar to denote gene connectivity. The color ingredients must only be separated by comma, e.g. purple,yellow,blue, which means gene connectivity increases from purple to yellow. If too many edges (e.g.: > 300) are displayed, the network could get stuck. So the ‘Adjacency threshold’ option sets a threthold to filter out weak edges. If not too many edges retained (e.g.: < 300), users can check ‘Yes’ under ‘Show plot’, then the network would be responsive smoothly. To maintain acceptable performance, users are advised to choose a stringent threshold (e.g. 0.9) initially, then decrease the value gradually. The interactive feature allows users to zoom in and out, or drag a gene around. All the gene IDs in the network module are listed in ‘Select by id’ in decreasing order according to gene connectivity. Same with static mode, the target gene ID is appended ’_target’.

If gene annotation is appended as a column in the subsetted expression matrix, it will be detected by the function network automatically, and is seen by mousing over a node.

network(ID="HRE2", data=sub.mat[['sub_matrix']],  adj.mod=adj.mod, static=FALSE)

6 Shiny App

In additon to generating SHMs and the corresponding gene context plots from R, spatialHeatmap includes a Shiny App that provides access to all the functionalities from an intuitive-to-use web browser interface. Apart from being very user-friendly, this App conveniently organizes the results of the entire visualization workflow in a single browser window with options to adjust the parameters of the individual components interactively. For instance, genes can be selected and replotted in the SHM simply by clicking the corresponding rows in the expression table included in the same window. The subplots of SHMs, including multiple growth stages, are presented in three modes “Basic”, “Animation”, and “Video”. The “Basic” compiles all subplots in the same page, and the dimension, layout, and color key are all customizable, the “Animation” displays each subplot sequentially in an animation. The animation can be played continuously or paused at a frame. Each frame can be zoomed in and out by drawing a rectangle and clicking the “home” icon in the top-right toolbar respectively. When mouse over a shape, the information of gene, condition, feature identifier, and value are displayed in a tootip, while the “Video” assembles all SHMs in am MP4 file. This representation is very efficient in guiding the interpretation of the results in a visual and user-friendly manner. For testing purposes, the spatialHeatmap Shiny App also includes ready-to-use sample expression data and aSVG images along with embedded user instructions.

6.1 Local System

The Shiny App of spatialHeatmap can be launched from an R session with the following function call.

shiny_all()

[ThG-Comment: many menu items are not functional in the Shiny App. E.g.: Instructions and Acknowledgement return error messages.] jianhai-reply: Missing pre-uploaded files are added, and now it is functional.

The dashboard panels of the Shiny App are organized as follows:

  1. Left Side Panel: menu for input files and instruction
  2. Data Matrix: scrollable data matrix uploaded by users
  3. Spatial Heatmap: spatially colored images based on aSVG image provided by users
  4. Matrix Heatmap: Matrix Heatmap: hierarchical clustering results of target items and nearest neighbors showing most similar abundance profiles
  5. Network Graph: interactive network module containing a specific target item

The parameters for controlling each functionality are placed on top of respective panel.

A screenshot of the Spatial Heatmap component within the Shiny App window is shown below (Figure 12).

Screenshot of `spatialHeatmap's` Shiny App. This the Spatial Heatmap component. The generated SHMs are one of the pre-uploaded examples.

Figure 12: Screenshot of spatialHeatmap's Shiny App
This the Spatial Heatmap component. The generated SHMs are one of the pre-uploaded examples.

After launching, the Shiny App displays by default one of the included data sets.

The gene expression data and aSVG image are uploaded to the Shiny App as tabular text (e.g. in CSV or TSV format) and SVG file, respectively. To also allow users to upload gene expression data stored in SummarizedExperiment objects, one can export them from R to a tabular file with the filter_data function. In this function call, the user sets the desired directory path under dir. Within this directory the tabular file will be written to local_mode_result/processed_data.txt in TSV format. The column names in the exported tabular file preserve the experimental design information from the colData slot by concatenating the corresponding sample and condition information separated by double underscores. An example of this format is shown in Table 1.

To interactively access gene- or transcript-level annotations in the plots and tables of the Shiny App, such as viewing functional descriptions by moving the cursor over network nodes, the corresponding annotation column needs to be present in the rowData slot and its column name assigned to the ann argument. In the exported tabular file the extra annotation column is appended to the expression matrix.

jianhai-reply: yes, that’s fine. 10 Check for correctness. For readability I made a lot of changes in this paragraph.

se.fil.arab <- filter_data(data=se.aggr.sh, ann="Target.Description", sam.factor='samples', con.factor='conditions', pOA=c(0.03, 6), CV=c(0.30, 100), dir='./')

6.2 Server Deployment

As most Shiny Apps, spatialHeatmap can be deployed as a centralized web service. A major advantage of a web server deployment is that the functionalities can be accessed remotely by anyone on the internet without the need to use R on the user system. For deployment one can use custom web servers or cloud services, such as AWS, GCP or shinysapps.io. An example web instance for testing spatialHeatmap online is available here.

Lastly, in either local or online deployment there is a potential computation issue if the tabular file contains too many rows such as more than 10,000. Specifically, too many rows would cause intensive computation in the network analysis and potentially results in system collapse. To prevent this potential issue, the Shiny App is added an option of local work mode. So users can get the intensive network analysis computed outside Shiny and then upload the computed results to the local mode. Note, in this case three files should be uploaded: filtered data matrix, adjacency matrix, module assignment, where the last two files should be generated on the first one. The details are documented in help files of function filter_data and adj_mod.

7 Supplement

To generate SHMs with custom data, proper formatting of the numeric (assay) data and aSVG images is required. A tabular target file describing the numeric data can also be provided but is optional. This section provides additional details on these three input types.

7.1 Numeric Data

The numceric data used to color the features in aSVG images can be provided as three different object types including vector, data.frame and SummerizedExperiment. When working with complex omics-based assay data then the latter provides the most flexibility, and thus should be the preferred container class for managing numeric data in spatialHeatmap. Both data.frame and SummarizedExperiment can hold data from many measured items, such as genes across many samples and conditions. In contrast to this, the vector class is only suitable for data from single items. Due to its simplicity this less complex container is often useful for testing or when dealing with simple data sets.

7.1.1 Object Types

7.1.1.1 vector

When using numeric vectors as input to spatial_hm, then their name slot needs to be populated with strings matching the feature names in the corresponding aSVG. To also specify conditions, their labels need to be appended to the feature names with double underscores as separator, i.e. ’feature__condition’.

The following example replots the toy example for two spatial features (‘occipital lobe’ and ‘parietal lobe’) and two conditions (‘1’ and ‘2’).

vec <- sample(x=1:100, size=5) # Random numeric values
names(vec) <- c('occipital lobe__condition1', 'occipital lobe__condition2', 'parietal lobe__condition1', 'parietal lobe__condition2', 'notMapped') # Assign unique names to random values
vec
## occipital lobe__condition1 occipital lobe__condition2 
##                         23                         84 
##  parietal lobe__condition1  parietal lobe__condition2 
##                         66                          2 
##                  notMapped 
##                         20

With this configuration the resulting plot contains two spatial heatmap plots for the human brain, one for ‘condition 1’ and another one for ‘contition 2’. To keep the build time and storage size of this package to a minimum, the spatial_hm function call in the code block below is not evaluated. Thus, the corresponding SHM is not shown in this vignette.

spatial_hm(svg.path=svg.hum, data=vec, ID='toy', ncol=1, legend.r=1.2, sub.title.size=14)

7.1.1.2 data.frame

Compared to the above vector input, data.frames are structured here like row-wise appended vectors, where the name slot information in the vectors is stored in the column names. Each row also contains a name that corresponds to the corresponding item name, such as a gene ID. The naming of spatial features and conditions in the column names follows the same conventions as the naming used for the name slots in the above vector example.

The following illustrates this with an example where a numeric data.frame with random numbers is generated containing 20 rows and 5 columns. To avoid name clashes, the values in the rows and columns should be unique.

df.test <- data.frame(matrix(sample(x=1:1000, size=100), nrow=20)) # Create numeric data.frame
colnames(df.test) <- names(vec) # Assign column names
rownames(df.test) <- paste0('gene', 1:20) # Assign row names
df.test[1:3, ]
##       occipital lobe__condition1 occipital lobe__condition2
## gene1                         63                        700
## gene2                        836                        731
## gene3                        785                        662
##       parietal lobe__condition1 parietal lobe__condition2
## gene1                       305                       582
## gene2                       383                       573
## gene3                       187                       238
##       notMapped
## gene1       770
## gene2       395
## gene3       934

With the resulting data.frame one can generate the same SHM as in the previous example, that used a named vector as input to the spatial_hm function. Additionally, one can now select each row by its name (here gene ID) under the ID argument.

spatial_hm(svg.path=svg.hum, data=df.test, ID=c('gene1'), ncol=1, legend.r=1.2, sub.title.size=14)

Additional information can be appended to the data.frame column-wise, such as annotation data including gene descriptions. This information can then be displayed interactively in the network plots of the Shiny App by placing the curser over network nodes.

df.test$ann <- paste0('ann', 1:20)
df.test[1:3, ]
##       occipital lobe__condition1 occipital lobe__condition2
## gene1                         63                        700
## gene2                        836                        731
## gene3                        785                        662
##       parietal lobe__condition1 parietal lobe__condition2
## gene1                       305                       582
## gene2                       383                       573
## gene3                       187                       238
##       notMapped  ann
## gene1       770 ann1
## gene2       395 ann2
## gene3       934 ann3

7.1.1.3 SummarizedExperiment

The SummarizedExperiment class is a much more extensible and flexible container for providing metadata for both rows and columns of numeric data stored in tabular format.

To import experimental design information from tabular files, users can provide a target file that will be stored in the colData slot of the SummarizedExperiment (SE, M. Morgan et al. (2018)) object. In other words, the target file provides the metadata for the columns of the numeric assay data. Usually, the target file contains at least two columns: one for the features/samples and one for the conditions. Replicates are indicated by identical entries in these columns. The actual numeric matrix representing the assay data is stored in the assay slot, where the rows correspond to items, such as gene IDs. Optionally, additional annotation information for the rows (e.g. gene descriptions) can be stored in the rowData slot.

For constructing a valid SummarizedExperiment object, that can be used by the spatial_hm function, the target file should meet the following requirements.

  1. It should be imported with read.table or read.delim into a data.frame or the data.frame can be constructed in R on the fly (as shown below).

  2. It should contain at least two columns. One column represents the features/samples and the other one the conditions. The rows in the target file correspond to the columns of the numeric data stored in the assay slot. If the condition column is empty, then the same condition is assumed under the corresponding features/samples entry.

  3. The feature/sample names must have matching entries in to corresponding aSVG to be considered in the final SHM. Note, the double underscore is a special string that is reserved for specific purposes in spatialHeatmap, and thus should be avoided for naming feature/samples and conditions.

The following example illustrates the design of a valid SummarizedExperiment object for generating SHMs. In this example, the ‘occipital lobe’ tissue has 2 conditions and each condition has 2 replicates. Thus, there are 4 assays for occipital lobe, and the same design applies to the parietal lobe tissue.

sample <- c(rep('occipital lobe', 4), rep('parietal lobe', 4))
condition <- rep(c('condition1', 'condition1', 'condition2', 'condition2'), 2)
target.test <- data.frame(sample=sample, condition=condition, row.names=paste0('assay', 1:8))
target.test
##                sample  condition
## assay1 occipital lobe condition1
## assay2 occipital lobe condition1
## assay3 occipital lobe condition2
## assay4 occipital lobe condition2
## assay5  parietal lobe condition1
## assay6  parietal lobe condition1
## assay7  parietal lobe condition2
## assay8  parietal lobe condition2

The assay slot is populated with a 8 x 20 data.frame containing random numbers. Each column corresponds to an assay in the target file (here imported into colData), while each row corresponds to a gene.

df.se <- data.frame(matrix(sample(x=1:1000, size=160), nrow=20))
rownames(df.se) <- paste0('gene', 1:20)
colnames(df.se) <- row.names(target.test)
df.se[1:3, ]
##       assay1 assay2 assay3 assay4 assay5 assay6 assay7 assay8
## gene1    133     92    937    722    590     98    490    341
## gene2    912    953    902    525    475    560    516    615
## gene3    472     19    964    877    174    771    900    163

Next, the final SummarizedExperiment object is constructed by providing the numeric and target data under the assays and colData arguments, respectively.

se <- SummarizedExperiment(assays=df.se, colData=target.test)
se
## class: SummarizedExperiment 
## dim: 20 8 
## metadata(0):
## assays(1): ''
## rownames(20): gene1 gene2 ... gene19 gene20
## rowData names(0):
## colnames(8): assay1 assay2 ... assay7 assay8
## colData names(2): sample condition

If needed row-wise annotation information (e.g. for genes) can be included in the SummarizedExperiment object as well. This can be done during the construction of the initial object, or as below by injecting the information into an existing SummarizedExperiment object.

rowData(se) <- df.test['ann']

In this simple example, possible normalization and filtering steps are skipped. Yet, the aggregation of replicates is performed as shown below.

se.aggr <- aggr_rep(data=se, sam.factor='sample', con.factor='condition', aggr='mean')
## Syntactically valid column names are made!
assay(se.aggr)[1:3, ]
##       occipital.lobe__condition1 occipital.lobe__condition2
## gene1                      112.5                      829.5
## gene2                      932.5                      713.5
## gene3                      245.5                      920.5
##       parietal.lobe__condition1 parietal.lobe__condition2
## gene1                     344.0                     415.5
## gene2                     517.5                     565.5
## gene3                     472.5                     531.5

With the fully configured SummarizedExperiment object, a similar SHM is plotted as in the previous examples.

spatial_hm(svg.path=svg.hum, data=se.aggr, ID=c('gene1'), ncol=1, legend.r=1.2, sub.title.size=14)

7.2 aSVG File

7.2.1 aSVG repository

An aSVG repository, that can be used by spatialHeatmap directly, has been generated by the EBI Gene Expression Group. It contains annatomical aSVG images from different species. These SVG images are also used by the Expression Atlas database. In addition, the spatialHeatmap has its own repository called spatialHeatmap aSVG Repository, where some aSVG files developed in this project are already deposited (e.g. Figure 8).11 It is confusing that you are not mentioning here your custom aSVG repos.

If users cannot find a target aSVG in the two repositories, we have developed a step-by-step SVG tutorial for creating custom aSVG images. For example, the BAR eFP browser at University of Toronto contains many anatomical images, and these images can be used as templates in the SVG tutorial to make custom aSVGs.

We will add more aSVGs to our repository in the future and users are welcome to deposit their own aSVGs there to share with other spatialHeatmap users.

7.2.2 Update aSVG features

To create and edit existing feature identifiers in aSVGs, the update_feature function can be used. The demonstration below, creates an empty folder ~/test1 and copies into it the homo_sapiens.brain.svg aSVG image provided by the spatialHeatmap package. Subsequently, selected feature annotations are updated in this file.

if (!dir.exists('~/test1')) dir.create('~/test1') # Create an empty directory
svg.hum <- system.file("extdata/shinyApp/example", 'homo_sapiens.brain.svg', package="spatialHeatmap") 
file.copy(from=svg.hum, to='~/test1', overwrite=FALSE) # Copy "homo_sapiens.brain.svg" file into '~/test1'

Query the above aSVG with feature and species keywords, and return the resulting matches in a data.frame.

feature.df <- return_feature(feature=c('frontal cortex'), species=c('homo sapiens', 'brain'), dir='~/test1', remote=FALSE, keywords.any=FALSE)
feature.df

Subsequently, create a character vector of new feature identifiers corresponding to each of the returned features. In the following examples spaces in strings will be filled with dots. This character vector must be added to the first column of the feature data.frame. The latter is used by the update_feature function to look up new features.

Sample code that creates new feature names and stores them in a character vector.

f.new <- c('frontal.cortex', 'prefrontal.cortex')

Next, new features are added to the first column of the feature data.frame.

feature.df.new <- cbind(featureNew=f.new, feature.df)
feature.df.new

Finally, the features are updated in the aSVG file(s) located in the ~/test1 directory.

update_feature(feature=feature.df.new, dir='~/test1')


# Version Informaion
sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      stats4    parallel  stats     graphics  grDevices
##  [7] utils     datasets  methods   base     
## 
## other attached packages:
##  [1] spatialHeatmap_0.99.0       igraph_1.2.5               
##  [3] reshape2_1.4.4              flashClust_1.01-2          
##  [5] genefilter_1.70.0           data.table_1.12.8          
##  [7] WGCNA_1.69                  fastcluster_1.1.25         
##  [9] dynamicTreeCut_1.63-1       ggdendro_0.1-20            
## [11] gridExtra_2.3               rsvg_2.1                   
## [13] grImport_0.9-3              XML_3.99-0.3               
## [15] DT_0.13                     visNetwork_2.0.9           
## [17] plotly_4.9.2.1              shinydashboard_0.7.1       
## [19] shiny_1.4.0.2               ggplot2_3.3.0              
## [21] GEOquery_2.56.0             ExpressionAtlas_1.16.0     
## [23] xml2_1.3.2                  limma_3.44.1               
## [25] SummarizedExperiment_1.18.1 DelayedArray_0.14.0        
## [27] matrixStats_0.56.0          Biobase_2.48.0             
## [29] GenomicRanges_1.40.0        GenomeInfoDb_1.24.0        
## [31] IRanges_2.22.1              S4Vectors_0.26.1           
## [33] BiocGenerics_0.34.0         knitr_1.28                 
## [35] BiocStyle_2.16.0            nvimcom_0.9-25             
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.7        rols_2.16.1           
##   [3] Hmisc_4.4-0            plyr_1.8.6            
##   [5] lazyeval_0.2.2         splines_4.0.0         
##   [7] crosstalk_1.1.0.1      BiocParallel_1.22.0   
##   [9] digest_0.6.25          foreach_1.5.0         
##  [11] htmltools_0.4.0        magick_2.3            
##  [13] GO.db_3.11.1           gdata_2.18.0          
##  [15] magrittr_1.5           checkmate_2.0.0       
##  [17] memoise_1.1.0          cluster_2.1.0         
##  [19] doParallel_1.0.15      readr_1.3.1           
##  [21] annotate_1.66.0        prettyunits_1.1.1     
##  [23] jpeg_0.1-8.1           colorspace_1.4-1      
##  [25] blob_1.2.1             xfun_0.13             
##  [27] dplyr_0.8.5            crayon_1.3.4          
##  [29] RCurl_1.98-1.2         jsonlite_1.6.1        
##  [31] impute_1.62.0          survival_3.1-12       
##  [33] iterators_1.0.12       glue_1.4.1            
##  [35] gtable_0.3.0           zlibbioc_1.34.0       
##  [37] XVector_0.28.0         scales_1.1.1          
##  [39] DBI_1.1.0              edgeR_3.30.0          
##  [41] Rcpp_1.0.4.6           viridisLite_0.3.0     
##  [43] xtable_1.8-4           progress_1.2.2        
##  [45] htmlTable_1.13.3       gridGraphics_0.5-0    
##  [47] foreign_0.8-79         bit_1.1-15.2          
##  [49] preprocessCore_1.50.0  Formula_1.2-3         
##  [51] htmlwidgets_1.5.1      httr_1.4.1            
##  [53] gplots_3.0.3           RColorBrewer_1.1-2    
##  [55] acepack_1.4.1          ellipsis_0.3.1        
##  [57] farver_2.0.3           pkgconfig_2.0.3       
##  [59] nnet_7.3-14            locfit_1.5-9.4        
##  [61] labeling_0.3           ggplotify_0.0.5       
##  [63] tidyselect_1.1.0       rlang_0.4.6           
##  [65] later_1.0.0            AnnotationDbi_1.50.0  
##  [67] munsell_0.5.0          tools_4.0.0           
##  [69] RSQLite_2.2.0          fastmap_1.0.1         
##  [71] evaluate_0.14          stringr_1.4.0         
##  [73] yaml_2.2.1             bit64_0.9-7           
##  [75] caTools_1.18.0         purrr_0.3.4           
##  [77] mime_0.9               compiler_4.0.0        
##  [79] rstudioapi_0.11        curl_4.3              
##  [81] png_0.1-7              tibble_3.0.1          
##  [83] geneplotter_1.66.0     stringi_1.4.6         
##  [85] highr_0.8              lattice_0.20-41       
##  [87] Matrix_1.2-18          vctrs_0.3.0           
##  [89] pillar_1.4.4           lifecycle_0.2.0       
##  [91] BiocManager_1.30.10    bitops_1.0-6          
##  [93] httpuv_1.5.2           R6_2.4.1              
##  [95] latticeExtra_0.6-29    bookdown_0.19         
##  [97] promises_1.1.0         KernSmooth_2.23-17    
##  [99] codetools_0.2-16       MASS_7.3-51.6         
## [101] gtools_3.8.2           assertthat_0.2.1      
## [103] DESeq2_1.28.1          withr_2.2.0           
## [105] GenomeInfoDbData_1.2.3 hms_0.5.3             
## [107] rpart_4.1-15           tidyr_1.0.3           
## [109] rmarkdown_2.1          rvcheck_0.1.8         
## [111] base64enc_0.1-3

8 Funding

This project has been funded by NSF awards: PGRP-1546879, PGRP-1810468, PGRP-1936492.

References

Cardoso-Moreira, Margarida, Jean Halbert, Delphine Valloton, Britta Velten, Chunyan Chen, Yi Shao, Angélica Liechti, et al. 2019. “Gene Expression Across Mammalian Organ Development.” Nature 571 (7766): 505–9.

Chang, Winston, and Barbara Borges Ribeiro. 2018. Shinydashboard: Create Dashboards with ’Shiny’. https://CRAN.R-project.org/package=shinydashboard.

Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. n.d. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.

Davis, Sean, and Paul Meltzer. 2007. “GEOquery: A Bridge Between the Gene Expression Omnibus (GEO) and BioConductor.” Bioinformatics 14: 1846–7.

Gatto, Laurent. 2019. Rols: An R Interface to the Ontology Lookup Service. http://lgatto.github.com/rols/.

Gautier, Laurent, Leslie Cope, Benjamin M. Bolstad, and Rafael A. Irizarry. 2004. “Affy—analysis of Affymetrix GeneChip Data at the Probe Level.” Bioinformatics 20 (3). Oxford, UK: Oxford University Press: 307–15. doi:10.1093/bioinformatics/btg405.

Huber, W., V. J. Carey, R. Gentleman, S. An ders, M. Carlson, B. S. Carvalho, H. C. Bravo, et al. 2015. “Orchestrating High-Throughput Genomic Analysis Wit H Bioconductor.” Nature Methods 12 (2): 115–21. http://www.nature.com/nmeth/journal/v12/n2/full/nmeth.3252.html.

Keays, Maria. 2019. ExpressionAtlas: Download Datasets from EMBL-EBI Expression Atlas.

Langfelder, Peter, and Steve Horvath. 2008. “WGCNA: An R Package for Weighted Correlation Network Analysis.” BMC Bioinformatics 9 (December): 559.

———. 2012. “Fast R Functions for Robust Correlations and Hierarchical Clustering.” Journal of Statistical Software 46 (11): 1–17. http://www.jstatsoft.org/v46/i11/.

Langfelder, Peter, Bin Zhang, and with contributions from Steve Horvath. 2016. DynamicTreeCut: Methods for Detection of Clusters in Hierarchical Clustering Dendrograms. https://CRAN.R-project.org/package=dynamicTreeCut.

Lekschas, Fritz, Harald Stachelscheid, Stefanie Seltmann, and Andreas Kurtz. 2015. “Semantic Body Browser: Graphical Exploration of an Organism and Spatially Resolved Expression Data Visualization.” Bioinformatics 31 (5): 794–96.

Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15 (12): 550. doi:10.1186/s13059-014-0550-8.

Maag, Jesper L V. 2018. “Gganatogram: An R Package for Modular Visualisation of Anatograms and Tissues Based on Ggplot2.” F1000Res. 7 (September): 1576.

Merkin, Jason, Caitlin Russell, Ping Chen, and Christopher B Burge. 2012. “Evolutionary Dynamics of Gene and Isoform Regulation in Mammalian Tissues.” Science 338 (6114): 1593–9.

Morgan, Martin, Valerie Obenchain, Jim Hester, and Hervé Pagès. 2018. SummarizedExperiment: SummarizedExperiment Container.

Mungall, Christopher J, Carlo Torniai, Georgios V Gkoutos, Suzanna E Lewis, and Melissa A Haendel. 2012. “Uberon, an integrative multi-species anatomy ontology.” Genome Biol. 13 (1): R5. doi:10.1186/gb-2012-13-1-r5.

Muschelli, John, Elizabeth Sweeney, and Ciprian Crainiceanu. 2014. “BrainR: Interactive 3 and 4D Images of High Resolution Neuroimage Data.” R J. 6 (1): 41–48.

Mustroph, Angelika, M Eugenia Zanetti, Charles J H Jang, Hans E Holtan, Peter P Repetti, David W Galbraith, Thomas Girke, and Julia Bailey-Serres. 2009. “Profiling Translatomes of Discrete Cell Populations Resolves Altered Cellular Priorities During Hypoxia in Arabidopsis.” Proc Natl Acad Sci U S A 106 (44): 18843–8.

Papatheodorou, Irene, Nuno A Fonseca, Maria Keays, Y Amy Tang, Elisabet Barrera, Wojciech Bazant, Melissa Burke, et al. 2018. “Expression Atlas: gene and protein expression across multiple studies and organisms.” Nucleic Acids Res. 46 (D1): D246–D251. doi:10.1093/nar/gkx1158.

Prudencio, Mercedes, Veronique V Belzil, Ranjan Batra, Christian A Ross, Tania F Gendron, Luc J Pregent, Melissa E Murray, et al. 2015. “Distinct Brain Transcriptome Profiles in C9orf72-Associated and Sporadic ALS.” Nat. Neurosci. 18 (8): 1175–82.

Ravasz, E, A L Somera, D A Mongru, Z N Oltvai, and A L Barabási. 2002. “Hierarchical Organization of Modularity in Metabolic Networks.” Science 297 (5586): 1551–5.

Robinson, Mark D, Davis J McCarthy, and Gordon K Smyth. 2010. “EdgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data.” Bioinformatics 26 (1): 139–40. doi:10.1093/bioinformatics/btp616.

Waese, Jamie, Jim Fan, Asher Pasha, Hans Yu, Geoffrey Fucile, Ruian Shi, Matthew Cumming, et al. 2017. “EPlant: Visualizing and Exploring Multiple Levels of Data for Hypothesis Generation in Plant Biology.” Plant Cell 29 (8): 1806–21.

Winter, Debbie, Ben Vinegar, Hardeep Nahal, Ron Ammar, Greg V Wilson, and Nicholas J Provart. 2007. “An ‘Electronic Fluorescent Pictograph’ Browser for Exploring and Analyzing Large-Scale Biological Data Sets.” PLoS One 2 (8): e718.